Network visualization (in R) with “netplot” and motif counting (in C++) with “barry”

SCI Seminar

George G. Vega Yon, Ph.D.

Division of Epidemiology

University of Utah

2023-04-07

Whoami

  • Research Assistant Professor of Epidemiology.

  • Ph.D. in Biostatistics from USC and M.Sc. in Economics from Caltech.

  • Methodologist working at the intersection between Statistical Computing and Complex Systems Modeling.

Network visualization with netplot

You can download the slides from
ggv.cl/slides/sci2023

netplot in a nutshell

  • What: An R package for network visualization inspired by Gephi.

  • Why: Opinionated way to visualize graphs.

  • Where: You can get the dev version on GitHub (USCCANA/netplot) or the stable version on CRAN.

Main features

  • Visualization engine: The grid system (the same one used by ggplot2).

  • Layout algorithms: Uses igraph’s layout by default.

  • Vertex sizes: Relative to the drawing area.

Example code+output

The personal friendship network of a faculty of a UK university, consisting of 81 vertices (individuals) and 817 directed and weighted connections. The school affiliation of each individual is stored as a vertex attribute. This dataset can serve as a testbed for community detection algorithms.

How?

library(netplot)
library(igraph)

library(igraphdata)
data("UKfaculty")

# Vertex colors as a function of group membership:
# one palette color per unique group, indexed by each vertex's group
vcols <- V(UKfaculty)$Group
vcols <- palette.colors(
  n = length(unique(vcols))
)[vcols]

set.seed(323)
# Netplot call
nplot(
  UKfaculty,
  edge.line.breaks = 20,
  vertex.color     = vcols
  )

Challenges and Next steps

  • Speed up the code: grid objects can be computationally expensive to build.

  • Porter Bischof (Undergrad from UVU) will contribute and present at the INSNA Sunbelt conference (flagship conference of SNA).

Counting motifs with barry

barry in a nutshell

  • What: A C++ header-only template library for motif counting (and more).

  • Why: Implement Discrete Exponential-Family Models [DEFMs] for phylogenetics and social network analysis.

  • Where: You can get it on GitHub (USCBiostats/barry)

Main features

About 11K lines of C++ code built for statistical modeling:

  • Motif counting using change statistics (we will return to that).

  • Full and constrained enumeration of 0/1 arrays.

  • Computes the probability function for Discrete Exponential-Family Models [DEFMs].

  • Memory- and computationally efficient for pooled models.

Change statistics

  • Change statistics are at the core of ERGMs (Exponential-Family Random Graph Models).

  • Two great applications: (i) they make counting easy, and (ii) they can be used for sampling from the ERGM distribution.

Change statistics formals

  • The change statistic is a real-valued vector whose \(k\)-th entry equals the change in the \(k\)-th statistic when the \(ij\)-th tie is toggled from absent to present (equivalently, minus the change observed when the tie is removed). Formally:

    \[ \delta(y_{ij}: 0\to 1) = s(\mathbf{y})_{ij}^+ - s(\mathbf{y})_{ij}^- \]

    where \(s(\cdot)\) is a function returning graph \(\mathbf{y}\)’s observed statistics, and \(s(\mathbf{y})_{ij}^+\) and \(s(\mathbf{y})_{ij}^-\) are its values when \(y_{ij} = 1\) and \(y_{ij} = 0\), respectively.
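A minimal base-R sketch of this definition, computed by brute force on a toy undirected graph. The helper names `s()` and `change_stat()` and the 4-node graph are made up for illustration; this is not how barry or ergm compute change statistics internally.

```r
# Sufficient statistics of an undirected graph stored as a 0/1 adjacency matrix
s <- function(y) {
  c(
    edges     = sum(y) / 2,                    # each tie appears twice in y
    triangles = sum(diag(y %*% y %*% y)) / 6   # closed 3-walks, 6 per triangle
  )
}

# Change statistic for dyad (i, j): s(y) with the tie present minus s(y) without it
change_stat <- function(y, i, j) {
  y_plus <- y_minus <- y
  y_plus[i, j]  <- y_plus[j, i]  <- 1
  y_minus[i, j] <- y_minus[j, i] <- 0
  s(y_plus) - s(y_minus)
}

# Toy graph (hypothetical): a triangle 1-2-3 plus the tie 3-4
y    <- matrix(0, 4, 4)
ties <- rbind(c(1, 2), c(2, 3), c(1, 3), c(3, 4))
y[ties]         <- 1
y[ties[, 2:1]]  <- 1   # symmetrize

# Effect of toggling the (absent) tie 1-4
delta <- change_stat(y, 1, 4)
print(delta)
```

Only the statistics touched by the dyad change, which is why counting via change statistics avoids recomputing \(s(\mathbf{y})\) from scratch in practice; the brute-force version above recomputes everything only to keep the example short.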

Formals 2

\[ \mbox{logit}\left(\mathbb{P}\left(y_{ij} = 1|y_{-ij}\right)\right) = \theta^{\mathbf{t}}\delta\left(y_{ij}:0\to 1\right), \]

with \(\delta\left(y_{ij}:0\to 1\right)\equiv s\left(\mathbf{y}\right)_{ij}^+ - s\left(\mathbf{y}\right)_{ij}^-\) as the vector of change statistics, in other words, the difference between the statistics when \(y_{ij} = 1\) and when \(y_{ij} = 0\). Equivalently,

\[ \mathbb{P}\left(y_{ij} = 1|y_{-ij}\right) = \frac{1}{1 + \mbox{exp}\left\{-\theta^{\mathbf{t}}\delta\left(y_{ij}:0\to 1\right)\right\}} \]
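As a rough numerical illustration of the two expressions above, the following base-R sketch plugs made-up numbers into the logistic form. The parameter values in `theta` and the change statistics in `delta` are assumptions chosen for the example, not estimates from any model in this talk.

```r
# Hypothetical parameter values, for illustration only
theta <- c(edges = -2.0, triangles = 0.5)

# Hypothetical change statistics for some dyad (i, j):
# adding the tie adds one edge and closes one triangle
delta <- c(edges = 1, triangles = 1)

# Conditional log-odds and probability of the tie, given the rest of the graph
log_odds <- sum(theta * delta)
p_tie    <- 1 / (1 + exp(-log_odds))   # same value as plogis(log_odds)

# One Gibbs-style update of the dyad: draw y_ij from its conditional distribution
set.seed(1)
y_ij <- rbinom(1, size = 1, prob = p_tie)

print(c(log_odds = log_odds, p_tie = p_tie))
```

Repeating this draw over dyads is the basic idea behind using change statistics for sampling: each update only needs \(\delta\), never the full normalizing constant.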

Examples of change statistics

Let’s look at the change statistics for edge count, triangles, and group-homophily when we remove tie 33-69.

Using ergm

s()               y-     y+     change
Edgecount         816    817    1
Triangles         5366   5399   33
Group-homophily   664    665    1
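
Read as vectors, the rows of the table give the change statistic for tie 33-69 as the difference between the y+ and y- columns (statistics ordered as edge count, triangles, group-homophily):

\[ \delta(y_{33,69}: 0\to 1) = \begin{pmatrix} 817 \\ 5399 \\ 665 \end{pmatrix} - \begin{pmatrix} 816 \\ 5366 \\ 664 \end{pmatrix} = \begin{pmatrix} 1 \\ 33 \\ 1 \end{pmatrix} \]

So this single tie accounts for one edge, 33 of the counted triangles, and one same-group tie.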

Current implemented models

  • Exponential-Family Random Graph Models [ERGMs].

  • DEFMs for multiple correlated outcomes (Markov Random Fields; in development with Drs. MJ Pugh and Tom Valente).

  • Motif counting applied to counting imaginary motifs in Cognitive Social Structures [CSS] (with Dr. Kyosuke Tanaka, submitted to Social Networks).

  • Modeling the evolution of gene functions in terms of transitions between functional states (research grant submitted to the National Human Genome Research Institute [NHGRI]).

ERGMs

  • Pooled models (multiple graphs/arrays) are a fundamental feature.
  • A single model may feature thousands of networks.
  • But if all have the same number of nodes (and other features)… we only need to enumerate once.
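
The "enumerate once" idea can be sketched in base R for a tiny case. The example assumes directed graphs on 3 nodes with two illustrative statistics (edges and mutual ties); the names `support`, `stats_of()`, and `loglik()` are hypothetical and do not reflect barry's C++ API.

```r
n     <- 3
# Linear indices of the off-diagonal cells (the 6 possible directed ties)
dyads <- which(row(diag(n)) != col(diag(n)))

# All 2^6 = 64 directed graphs on 3 nodes, one per row
grid <- as.matrix(expand.grid(rep(list(0:1), length(dyads))))

# Sufficient statistics of one graph, given the values of its 6 cells
stats_of <- function(cells) {
  y <- matrix(0, n, n)
  y[dyads] <- cells
  c(edges = sum(y), mutual = sum(y * t(y)) / 2)
}

# Support of the sufficient statistics: computed once, reused by every network
support <- t(apply(grid, 1, stats_of))   # 64 x 2 matrix

# Pooled log-likelihood: the normalizing constant is shared by all networks
# of the same size, so it is evaluated once per theta, not once per network
loglik <- function(theta, observed_stats) {
  log_kappa <- log(sum(exp(support %*% theta)))
  sum(observed_stats %*% theta) - nrow(observed_stats) * log_kappa
}

# Two hypothetical observed networks, summarized by (edges, mutual)
observed <- rbind(c(2, 1), c(3, 0))
print(loglik(c(0, 0), observed))
```

With thousands of same-sized networks, the cost of building `support` is paid a single time, which is the saving the bullets above describe. Full enumeration grows as \(2^{n(n-1)}\), so this brute-force version is only practical for very small arrays.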

Final words

Today’s talk

  • The netplot R package for graph visualization.

  • barry: Your go-to motif accountant.

Other projects

fmcmc | ergmito | aphylo | netdiffuseR | ABCoptim
slurmR | barry | rgexf

Thanks!